Hardware accelerators have become a de facto standard for achieving high performance on current supercomputers, and there are indications that this trend will strengthen in the future. Modern accelerators feature high-bandwidth memory next to the computing cores. For example, the Intel Knights Landing (KNL) processor is equipped with 16 GB of high-bandwidth memory (HBM), called MCDRAM, that works together with conventional DRAM. In theory, HBM can provide 5x higher bandwidth than conventional DRAM. However, many factors affect the performance that applications actually achieve, including the application's memory access pattern, the problem size, the threading level, and the memory configuration in use. In this paper, we analyze the Intel KNL system and quantify the impact of the most important factors on application performance, using a set of applications representative of scientific and data-analytics workloads. Our results show that applications with regular memory access patterns benefit from MCDRAM, achieving up to 3x the performance obtained when using only DRAM. By contrast, applications with random memory access patterns are latency-bound and may suffer performance degradation when using only MCDRAM. For those applications, using additional hardware threads may help hide latency and achieve higher aggregate bandwidth from HBM.
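One way to see why extra hardware threads can help a latency-bound application is Little's law, a standard queueing relation that is not stated in the abstract itself. The sketch below is illustrative only: the symbols are introduced here, and the 64-byte line size and 150 ns latency are hypothetical round numbers, not measurements of KNL.

$$
B_{\text{sustained}} \approx \frac{N_{\text{outstanding}} \cdot S_{\text{line}}}{L},
\qquad
\frac{64\ \text{B}}{150\ \text{ns}} \approx 0.43\ \text{GB/s per outstanding request}
$$

Here $N_{\text{outstanding}}$ is the number of memory requests in flight, $S_{\text{line}}$ the bytes per request, and $L$ the memory latency. Under these assumed values, approaching 100 GB/s would require roughly 235 concurrent requests. Random accesses give each thread few independent requests to overlap, so $N_{\text{outstanding}}$ grows mainly with the number of threads; this reading is consistent with, but not taken from, the results summarized above.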